Prediction-based regularization using data augmented regression

Authors

  • Giles Hooker
  • Saharon Rosset
Abstract

The role of regularization is to control fitted model complexity and variance by penalizing (or constraining) models to be in an area of model space that is deemed reasonable. This is typically achieved by penalizing a parametric or non-parametric representation of the model. In this paper we advocate the use of prior knowledge or expectations about the predictions of models for regularization. This has the twofold advantage of allowing a more intuitive interpretation of penalties and priors and explicitly controlling model extrapolation into relevant regions of the feature space. This second point is especially critical in high-dimensional modeling situations, where the curse of dimensionality implies that new prediction points usually require extrapolation. We demonstrate that prediction-based regularization can, in many cases, be stochastically implemented by simply augmenting the dataset with Monte Carlo data. We investigate the range of applicability of this implementation. An asymptotic analysis of the performance of Data Augmented Regression (DAR) in parametric and nonparametric linear regression, and in nearest neighbor regression, clarifies the regularizing behavior of DAR. We apply DAR to simulated and real data, and show that it is able to control the variance of extrapolation, while maintaining, and often improving, predictive accuracy.
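
As a rough illustration of the idea described in the abstract, the sketch below augments a dataset with Monte Carlo pseudo-observations before an ordinary least-squares fit. It is not the authors' implementation: the uniform sampling over the bounding box of the observed features, the constant prior prediction (the mean response), the penalty weight `lam`, and the function name `dar_linear_fit` are all assumptions made for this example.

```python
import numpy as np


def dar_linear_fit(X, y, n_aug=200, lam=1.0, prior_fn=None, seed=0):
    """Least-squares fit on data augmented with Monte Carlo pseudo-observations.

    Hypothetical helper for illustration; returns [intercept, slopes...].
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]

    # Assumption: sample augmentation points uniformly over the bounding box
    # of the observed features. Other sampling distributions are possible.
    X_aug = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_aug, p))

    # Assumption: the prior expectation for predictions is the mean response,
    # unless the caller supplies prior_fn mapping points to expected values.
    y_aug = np.full(n_aug, y.mean()) if prior_fn is None else prior_fn(X_aug)

    # Scaling each pseudo-row by sqrt(lam / n_aug) adds the penalty
    # (lam / n_aug) * sum_j (f(u_j) - prior(u_j))^2 to the squared-error loss.
    w = np.sqrt(lam / n_aug)

    def design(Z):
        # Prepend an intercept column of ones.
        return np.column_stack([np.ones(len(Z)), Z])

    A = np.vstack([design(X), w * design(X_aug)])
    b = np.concatenate([y, w * y_aug])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef
```

Under these assumptions, `lam=0` reduces to ordinary least squares, and increasing `lam` pulls the fitted predictions over the sampled region toward the prior expectation.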

Related articles

Extreme Learning Machine for Graph Signal Processing

In this article, we improve extreme learning machines for regression tasks using a graph signal processing based regularization. We assume that the target signal for prediction or regression is a graph signal. With this assumption, we use the regularization to enforce that the output of an extreme learning machine is smooth over a given graph. Simulation results with real data confirm that such...

A Machine Learning Approach for Air Quality Prediction: Model Regularization and Optimization

In this paper, we tackle air quality forecasting by using machine learning approaches to predict the hourly concentration of air pollutants (e.g., ozone, PM2.5 and sulfur dioxide). Machine learning, as one of the most popular techniques, is able to efficiently train a model on big data by using large-scale optimization algorithms. Although there exist some works applying machine learning...

Theoretical and Experimental Analyses of Tensor-Based Regression and Classification

We theoretically and experimentally investigate tensor-based regression and classification. Our focus is regularization with various tensor norms, including the overlapped trace norm, the latent trace norm, and the scaled latent trace norm. We first give dual optimization methods using the alternating direction method of multipliers, which is computationally efficient when the number of trainin...

Using Subspace Multiple Linear Regression for 3D Face Shape Prediction from a Single Image

In this paper, we compare four different Subspace Multiple Linear Regression methods for 3D face shape prediction from a single 2D intensity image. This problem is situated within the low observation-to-variable ratio context, where the sample covariance matrix is likely to be singular. Lately, efforts have been directed towards latent-variable based methods to estimate a regression operator wh...

Online Linear Regression using Burg Entropy

We consider the problem of online prediction with a linear model. In contrast to existing work in online regression, which regularizes based on squared loss or KL-divergence, we regularize using divergences arising from the Burg entropy. We demonstrate regret bounds for our resulting online gradient-descent algorithm; to our knowledge, these are the first online bounds involving Burg entropy. W...

Journal:
  • Statistics and Computing

Volume 22

Publication date: 2012